A Foundation for Flow-Based Program Matching Using Temporal Logic and Model Checking

A Foundation for Flow-Based Program Matching Using Temporal Logic and Model Checking, Julien Brunel, Damien Doligez, René Rydhof Hansen, Julia Lawall, Gilles Muller. POPL 2009.

Reasoning about program control-flow paths is an important functionality of a number of recent program matching languages and associated searching and transformation tools. Temporal logic provides a well-defined means of expressing properties of control-flow paths in programs, and indeed an extension of the temporal logic CTL has been applied to the problem of specifying and verifying the transformations commonly performed by optimizing compilers.

Nevertheless, in developing the Coccinelle program transformation tool for performing Linux collateral evolutions in systems code, we have found that existing variants of CTL do not adequately support rules that transform subterms other than the ones matching an entire formula. Being able to transform any of the subterms of a matched term seems essential in the domain targeted by Coccinelle. In this paper, we propose an extension to CTL named CTL-VW (CTL with variables and witnesses) that is a suitable basis for the semantics and implementation of Coccinelle's program matching language.

Our extension to CTL includes existential quantification over program fragments, which allows metavariables in the program matching language to range over different values within different control-flow paths, and a notion of witnesses that record such existential bindings for use in the subsequent program transformation process. We formalize CTL-VW and describe its use in the context of Coccinelle. We then assess the performance of the approach in practice, using a transformation rule that fixes several reference count bugs in Linux code.

The Coccinelle tool is quite fun to play with. You write something that looks like the output of patch, only with some extra patterns and boolean conditions in it, and the tool will go over your C source, find all the code that matches, and apply all the changes you've specified. It's open source and available online.

The theory described in this paper is quite fun, too -- the algorithms they describe are (surprisingly) not too complicated and apparently quite speedy.
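
To give a flavor of what path-sensitive matching involves, here is a toy existential-CTL checker in Haskell. This is emphatically not the paper's CTL-VW: it returns only a Boolean rather than witnesses, it quantifies over node labels rather than program fragments, and every name in it is invented for the sketch. It does show the key ingredient, though: an existential binder inside the formula, so a metavariable can take different values along different control-flow paths.

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    -- Toy node-labeled control-flow graph: each node carries one "event".
    type Node  = Int
    type Event = String
    data CFG = CFG { label :: Map.Map Node Event, succs :: Map.Map Node [Node] }

    type MetaVar = String
    type Env     = Map.Map MetaVar Event

    data Formula
      = Is Event                 -- the node's event is exactly this
      | Same MetaVar             -- the node's event equals the variable's binding
      | ExistsEv MetaVar Formula -- bind the variable to some event in the CFG
      | And Formula Formula
      | EX Formula               -- some successor satisfies the formula
      | EF Formula               -- some path eventually reaches a satisfying node

    sat :: CFG -> Env -> Formula -> Node -> Bool
    sat g env f n = case f of
      Is e         -> Map.lookup n (label g) == Just e
      Same x       -> case Map.lookup x env of
                        Just e  -> Map.lookup n (label g) == Just e
                        Nothing -> False
      ExistsEv x p -> any (\e -> sat g (Map.insert x e env) p n)
                          (Map.elems (label g))
      And p q      -> sat g env p n && sat g env q n
      EX p         -> any (sat g env p) (next n)
      EF p         ->
        let reach seen m
              | m `Set.member` seen = False
              | sat g env p m       = True
              | otherwise = any (reach (Set.insert m seen)) (next m)
        in reach Set.empty n
      where next m = Map.findWithDefault [] m (succs g)

For example, ExistsEv "v" (And (Same "v") (EX (EF (Same "v")))) asks for a node whose event recurs strictly later along some path, with the metavariable free to be bound differently at different starting nodes.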

Ensuring Correct-by-Construction Resource Usage by using Full-Spectrum Dependent Types

Where it has been done at all, formally demonstrating the correctness of functional programs has historically focused on proving essential properties derived from the functional specification of the software. In this paper, we show that correct-by-construction software development can also handle the equally important class of extra-functional properties, namely the correct usage of resources. We do this using a novel embedded domain-specific language approach that exploits the capabilities of full-spectrum dependent types. Our approach provides significant benefits over previous approaches based on less powerful type systems in reducing notational overhead, and in simplifying the process of formal proof.

More ammunition for the importance of embedded domain-specific languages, dependent types, and correctness-by-construction.

C++ Futures

The next C++ standard is supposed to include futures in the libraries. Futures allow assignment to a value that is to be calculated at a later time - assigning a promise in lieu of the final value until such time as it becomes available. The current thread carries on until the value is actually needed, at which time the value is made available or the thread goes into a wait state. Being a library facility, instead of a built-in language facility as in Oz or Alice-ML, the use of futures is not nearly as transparent. Still, one should be able to get a similar effect. In an article on Broken promises–C++0x futures, Bartosz Milewski criticizes the C++ implementation for its lack of composability.

What was omitted from the standard, however, was the ability to compose futures. Suppose you start several threads to perform calculations or retrieve data in parallel. You want to communicate with those threads using futures. And here’s the problem: you may block on any individual future but not on all of them. While you are blocked on one, other futures might become ready. Instead of spending your precious time servicing those other futures, you might be blocked on the most time-consuming one. The only option to process futures in order of completion is by polling (calling is_ready on consecutive futures in a loop) and burning the processor.

My personal opinion is that what he is really saying is that concurrency through futures (which in the simple case would be a declarative form of concurrency) doesn't easily support message-based concurrency. I suppose one could look at CTM and describe how to do Erlang-style concurrency with nothing more than dataflow variables. But I think the bigger problem is that we assume that there is a single approach to solving the concurrency problem. We might as well say that STM is good, but it fails to deliver declarative or message-based concurrency. I personally think languages, either through libraries or built-in language facilities, should provide multiple ways of dealing with concurrency and distribution. Although I don't use C++ anymore, I'm glad that they are integrating lightweight concurrency models into the language.
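
To make the dataflow-variable reading concrete, here is a minimal sketch of futures built on write-once variables, in Haskell rather than C++ since it keeps the composition short; the Future, future, force, and eitherOf names are invented for this sketch. eitherOf is the "wait on whichever finishes first" composition that the quoted article says C++0x futures lack, and it needs no polling: both producers race to fill a shared write-once variable. (A production version would handle exceptions and cancellation; the async package's waitEither plays this role in practice.)

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar
    import Control.Monad (void)

    -- A future as a dataflow variable: written once, read by anyone.
    newtype Future a = Future (MVar a)

    -- Spawn the computation; the caller carries on until it forces the value.
    future :: IO a -> IO (Future a)
    future act = do
      v <- newEmptyMVar
      void (forkIO (act >>= putMVar v))
      return (Future v)

    -- Blocks only if the value is not yet available.
    force :: Future a -> IO a
    force (Future v) = readMVar v

    -- Composition: the first of the two futures to complete wins.
    eitherOf :: Future a -> Future a -> IO a
    eitherOf fa fb = do
      out <- newEmptyMVar
      void (forkIO (force fa >>= void . tryPutMVar out))
      void (forkIO (force fb >>= void . tryPutMVar out))
      readMVar out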

(Surprised we don't have a category dedicated to concurrency, so I'll post this under parallel/distributed.)

Equality Saturation: A New Approach to Optimization

Equality Saturation: A New Approach to Optimization, Ross Tate, Michael Stepp, Zachary Tatlock, Sorin Lerner, POPL 2009.

Optimizations in a traditional compiler are applied sequentially, with each optimization destructively modifying the program to produce a transformed program that is then passed to the next optimization.

We present a new approach for structuring the optimization phase of a compiler. In our approach, optimizations take the form of equality analyses that add equality information to a common intermediate representation. The optimizer works by repeatedly applying these analyses to infer equivalences between program fragments, thus saturating the intermediate representation with equalities. Once saturated, the intermediate representation encodes multiple optimized versions of the input program.

At this point, a profitability heuristic picks the final optimized program from the various programs represented in the saturated representation. Our proposed way of structuring optimizers has a variety of benefits over previous approaches: our approach obviates the need to worry about optimization ordering, enables the use of a global optimization heuristic that selects among fully optimized programs, and can be used to perform translation validation, even on compilers other than our own.

We present our approach, formalize it, and describe our choice of intermediate representation. We also present experimental results showing that our approach is practical in terms of time and space overhead, is effective at discovering intricate optimization opportunities, and is effective at performing translation validation for a realistic optimizer.

I thought this was one of the more interesting papers at POPL this year. The idea of tackling the phase ordering problem by splitting the problem into two steps --- first computing classes of equivalent programs, and second picking the best member of the equivalence class --- is very clever.
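
As a deliberately naive illustration of that two-step structure (mine, not the paper's machinery), the Haskell sketch below saturates a set of whole programs under a few equality axioms and then lets a cost function pick the winner. The real system uses E-PEGs, which share subterms so that saturation stays compact; representing every equivalent program separately, as here, would never scale, but the two phases are the same.

    import qualified Data.Set as Set
    import Data.List (minimumBy)
    import Data.Ord (comparing)

    data Expr = Var String | Lit Int | Add Expr Expr | Mul Expr Expr | Shl Expr Expr
      deriving (Eq, Ord, Show)

    -- Equality "analyses": each maps a term to terms it is provably equal to.
    axioms :: Expr -> [Expr]
    axioms (Mul a (Lit 2))    = [Add a a, Shl a (Lit 1)]
    axioms (Add a b) | a == b = [Mul a (Lit 2)]
    axioms _                  = []

    -- Propose equal variants at the root and inside subterms.
    variants :: Expr -> [Expr]
    variants e = axioms e ++ case e of
      Add a b -> [Add a' b | a' <- variants a] ++ [Add a b' | b' <- variants b]
      Mul a b -> [Mul a' b | a' <- variants a] ++ [Mul a b' | b' <- variants b]
      Shl a b -> [Shl a' b | a' <- variants a] ++ [Shl a b' | b' <- variants b]
      _       -> []

    -- Saturate: close the set of known-equal programs under `variants`.
    -- (This naive closure terminates only because these axioms don't keep
    -- growing terms; the paper engineers its analyses and representation
    -- so that saturation is practical in general.)
    saturate :: Set.Set Expr -> Set.Set Expr
    saturate s
      | Set.size s' == Set.size s = s
      | otherwise                 = saturate s'
      where s' = Set.union s (Set.fromList (concatMap variants (Set.toList s)))

    -- Profitability heuristic: pick the cheapest member of the class.
    cost :: Expr -> Int
    cost (Var _)   = 0
    cost (Lit _)   = 0
    cost (Add a b) = 1 + cost a + cost b
    cost (Shl a b) = 1 + cost a + cost b
    cost (Mul a b) = 4 + cost a + cost b   -- treat multiplies as expensive

    best :: Expr -> Expr
    best = minimumBy (comparing cost) . Set.toList . saturate . Set.singleton

With these axioms, best (Mul (Var "x") (Lit 2)) avoids the expensive multiply without ever committing to a rewrite order.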

Denotational design with type class morphisms

Denotational design with type class morphisms. Conal Elliott.

Type classes provide a mechanism for varied implementations of standard interfaces. Many of these interfaces are founded in mathematical tradition and so have regularity not only of types but also of properties (laws) that must hold. Types and properties give strong guidance to the library implementor, while leaving freedom as well. Some of the remaining freedom is in how the implementation works, and some is in what it accomplishes.

To give additional guidance to the what, without impinging on the how, this paper proposes a principle of type class morphisms (TCMs), which further refines the compositional style of denotational semantics. The TCM idea is simply that the instance’s meaning is the meaning’s instance. This principle determines the meaning of each type class instance, and hence defines correctness of implementation. In some cases, it also provides a systematic guide to implementation, and in some cases, valuable design feedback.

The paper is illustrated with several examples of types, meanings, and morphisms.

To continue in our new all-Conal format... This paper brings together a bunch of things that Conal's been talking about lately, and "algebra of programming" fans will probably like his approach.

(I have a hunch that what he calls a "type class morphism" could be described using standard categorical jargon, but I haven't given it much thought. Suggestions?)
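
To make the slogan concrete, here is a small unofficial Haskell sketch in the spirit of the paper's FRP examples: a hypothetical Behavior type whose meaning is a function of time. Since functions are already a Functor and an Applicative, requiring that the meaning of each instance be the corresponding instance of the meaning essentially forces the definitions below.

    type Time = Double

    -- Hypothetical representation; the denotation is what matters.
    newtype Behavior a = B { at :: Time -> a }

    -- Denotation: what a Behavior *means*.
    meaning :: Behavior a -> (Time -> a)
    meaning = at

    -- The TCM laws: meaning (fmap f b)  == fmap f (meaning b)
    --               meaning (pure x)    == pure x
    --               meaning (bf <*> bx) == meaning bf <*> meaning bx
    instance Functor Behavior where
      fmap f (B g) = B (f . g)        -- for functions, fmap is composition

    instance Applicative Behavior where
      pure x      = B (const x)       -- meaning (pure x) = \_ -> x = pure x
      B f <*> B x = B (\t -> f t (x t))  -- pointwise application, as for functions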

Dana

Luke Palmer and Nick Szabo can shoot me for this if they want, but I think this is warranted, and I want to connect a couple of dots as well. Luke is one of a number of computer scientists, with Conal Elliott probably being the best known, who have devoted quite a bit of attention to Functional Reactive Programming, or FRP. FRP has been discussed on LtU off and on over the years, but, unusually for LtU IMHO, does not seem to have gotten the traction that some other similarly abstruse subjects have.

In parallel, LtU has had a couple of interesting threads about Second Life's economy, smart contracts, usage control, denial of service, technical vs. legal remedies, and the like. I would particularly like to call attention to this post by Nick Szabo, in which he discusses a contract language that he designed:

Designing the contract language radically changed my idea of what program flow and instruction pointers can be. Its successor, a general-purpose programming language, may thereby make event-oriented programming and concurrency far easier. The language is targeted at GUI programming as well as smart contracts, real-time, and workflow programming. My answer to the problems of concurrency and event handling is to make the ordering of instructions the syntactic and semantic core of the language. The order of execution and event handlers are the easiest things to express in the language rather than kludged add-ons to a procedural or functional language.

In recent private correspondence, Nick commented that he'd determined that he was reinventing synchronous programming à la Esterel, and mentioned "Reactive" programming.

Ding!

To make a potentially long entry somewhat shorter, Luke is working on a new language, Dana, which appears to have grown out of some frustration with existing FRP systems, including Conal Elliott's Reactive, currently perhaps the linchpin of FRP research. Luke's motivating kickoff post for the Dana project can be found here, and there are several follow-up posts, including links to experimental source code repositories. Of particularly motivating interest, IMHO, is this post, where Luke discusses FRP's interaction with garbage collection succinctly but nevertheless in some depth. Luke's most recent post makes the connection from Dana, which Luke has determined needs to have a dependently-typed core, to Illative Combinatory Logic explicit, and offers a ~100 line type checker for the core.

I find this very exciting: the project of being able to express computation centered on time, in the sense of Nick's contract language, in easy and safe ways strikes me as extremely compelling. I've intuited for some time now that FRP represents a real breakthrough in moving us past the Von Neumann runtime paradigm in fundamental ways, and between Conal Elliott's and Luke's work (and no doubt that of others), it seems to me that my sense of this may be borne out, with Nick's contract language, or something like it, as an initial application realm.

So I wanted to call attention to Luke's work, and by extension recapitulate Conal's and Nick's, both for the PLT aspects that Luke's work clearly represents, and as a challenge to the community to assist in the realization of Nick's design efforts, if at all possible.

A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler

G. Klein and T. Nipkow, A Machine-Checked Model for a Java-Like Language, Virtual Machine, and Compiler, ACM TOPLAS, vol. 28, no. 4, 2006.

We introduce Jinja, a Java-like programming language with a formal semantics designed to exhibit core features of the Java language architecture. Jinja is a compromise between realism of the language and tractability and clarity of the formal semantics. The following aspects are formalised: a big and a small step operational semantics for Jinja and a proof of their equivalence; a type system and a definite initialization analysis; a type safety proof of the small step semantics; a virtual machine (JVM), its operational semantics and its type system; a type safety proof for the JVM; a bytecode verifier, i.e. a dataflow analyzer for the JVM; a correctness proof of the bytecode verifier w.r.t. the type system; a compiler and a proof that it preserves semantics and well-typedness.

The emphasis of this work is not on particular language features but on providing a unified model of the source language, the virtual machine and the compiler. The whole development has been carried out in the theorem prover Isabelle/HOL.

This is a fairly lengthy article (clocking in at 77 pages), in part because it presents a wealth of technical detail. The authors state that the aim of the article "is to demonstrate the state-of-the-art in machine-checked language definitions."

For those who are curious, the Isabelle theories are available in the Archive of Formal Proofs.
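
For readers who have not seen the big-step/small-step pairing before, here is a toy Haskell analogue (an arithmetic language, nothing like Jinja's scale, and not from the paper): the equivalence Klein and Nipkow prove mechanically corresponds to the claim that evalBig and evalSmall below always agree.

    -- A toy expression language (not Jinja!) with both styles of semantics.
    data E = Lit Int | Plus E E

    -- Big-step: evaluate all the way to an answer in one judgment.
    evalBig :: E -> Int
    evalBig (Lit n)    = n
    evalBig (Plus a b) = evalBig a + evalBig b

    -- Small-step: one reduction at a time; Nothing means "already a value".
    step :: E -> Maybe E
    step (Lit _)                = Nothing
    step (Plus (Lit m) (Lit n)) = Just (Lit (m + n))
    step (Plus (Lit m) b)       = Plus (Lit m) <$> step b
    step (Plus a b)             = (\a' -> Plus a' b) <$> step a

    -- The equivalence: iterating `step` computes the same answer as `evalBig`.
    evalSmall :: E -> Int
    evalSmall e = case step e of
      Nothing -> evalBig e          -- e is Lit n, so this is just n
      Just e' -> evalSmall e'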

Parameterized Notions of Computation

Parameterized Notions of Computation, Robert Atkey, JFP 2008.

Moggi's Computational Monads and Power et al's equivalent notion of Freyd category have captured a large range of computational effects present in programming languages. Examples include non-termination, non-determinism, exceptions, continuations, side-effects and input/output. We present generalisations of both computational monads and Freyd categories, which we call parameterised monads and parameterised Freyd categories, that also capture computational effects with parameters.

Examples of such are composable continuations, side-effects where the type of the state varies and input/output where the range of inputs and outputs varies. By also considering structured parameterisation, we extend the range of effects to cover separated side-effects and multiple independent streams of I/O. We also present two typed λ-calculi that soundly and completely model our categorical definitions — with and without symmetric monoidal parameterisation — and act as prototypical languages with parameterised effects.

Once you've programmed with monads for a while, it's pretty common to start defining parameterized families of monads -- e.g., we might define a family of type constructors for IO, in which the program type additionally tracks which files the computation reads and writes from. This is a very convenient programming pattern, but the theory of it is honestly a little sketchy: on what basis do we conclude that the indices we define actually track what we intend them to? And furthermore, why can we believe that (say) the monadic equational laws still apply? Those are the questions Atkey lays out a nice solution to. He gives a nice categorical semantics for indexed, effectful computations, and then cooks up lambda calculi whose equational theory corresponds to the equations his semantics justifies.
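
As a concrete, unofficial Haskell rendering of the idea: a parameterised monad is a type constructor with two extra indices that bind threads through, and type-varying state is the standard first example. The class and names below are illustrative rather than taken from the paper.

    -- `m i j a`: a computation producing an `a` while moving the "world"
    -- from index i to index j; ibind composes the index changes.
    class IxMonad m where
      ireturn :: a -> m i i a
      ibind   :: m i j a -> (a -> m j k b) -> m i k b

    -- Side-effects where the type of the state varies, one of the
    -- motivating examples in the abstract.
    newtype IxState i j a = IxState { runIxState :: i -> (a, j) }

    instance IxMonad IxState where
      ireturn a           = IxState (\s -> (a, s))
      ibind (IxState f) k = IxState (\s -> let (a, s') = f s
                                           in runIxState (k a) s')

    getI :: IxState s s s
    getI = IxState (\s -> (s, s))

    putI :: t -> IxState s t ()
    putI t = IxState (\_ -> ((), t))

    -- Usage: start with an Int state, finish with a String state;
    -- the change of state type is checked statically in the indices.
    demo :: IxState Int String Int
    demo = getI `ibind` \n -> putI (show n) `ibind` \_ -> ireturn n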

The application to delimited continuations is quite nice, and the type theories can also give a little insight into the basics of how stuff like Hoare Type Theory works (which uses parameterized monads, with a very sophisticated language of parameters).
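
The delimited-continuations application fits the same mold. Reusing the hypothetical IxMonad class from the sketch above: continuations whose answer type changes over the course of the computation form a parameterised monad, which is exactly the flexibility that typing shift and reset requires.

    -- `IxCont r o a`: a computation that hands an `a` to its context,
    -- changing the answer type from `o` to `r` in the process.
    newtype IxCont r o a = IxCont { runIxCont :: (a -> o) -> r }

    instance IxMonad IxCont where
      ireturn a          = IxCont (\k -> k a)
      ibind (IxCont m) f = IxCont (\k -> m (\a -> runIxCont (f a) k))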

On a slightly tangential note, this also raises in my mind a methodological point. Over the last n years, we've seen many people identify certain type constructors whose usage is pervasive and greatly simplified by some syntactic extensions -- monads, comonads, applicative functors, arrows, and so on. It's hard to believe that we have exhausted the list of interesting types, and so together they constitute a good argument for some kind of language extension mechanism, such as macros. However, all these examples also raise the bar for when a macro is a good idea, because what makes them compelling is precisely that the right syntax yields an interesting and pretty equational theory in the extended language.

Admin notes

Several changes in the discussion dynamics on LtU require attention, I think.

First, and most important, is that the core business of LtU - the home page blog items posted by contributing editors - has been dwindling. What is more, the discussion forum, whose goal was to support the blog, has become very active, and is often only tangentially on topic for LtU.

I have received complaints from long-time members about the declining quality of the discussions, and while I haven't done any serious comparative analysis, I do know that I now regularly skip threads, especially highly active threads and threads that are dominated by only a few members. I think I am not alone in this. I encourage long-time members to continue publicly pointing out when threads become un-LtU-like, and remind everyone that these notes should not usually serve as an opportunity for meta-discussions about policy.

I urge the LtU contributing editors to make an extra effort in the next few months to post quality items to the home page. It seems that without more emphasis on the blog, LtU will lose its relevance. I call on all members to keep in mind the style and policies of LtU (do read the spirit page, if you haven't already), and move inappropriate discussions to other forums.

The second issue is - of course - spam. We are constantly bombarded by spam, most of which you don't see, since Anton and I usually manage to block the spam accounts from posting, and when spam does slip through we delete it fairly quickly. Still, in the last few days some spam messages did appear on the site, and took a while to delete, despite our continued efforts. I apologize for that.

In previous discussions of this issue it was decided not to require approval for all new messages (however, I do moderate posts from suspicious accounts). Spam does not survive long on LtU, since the site is active and threads that are posted to automatically appear in the tracker, so most of the time it is not a great concern.

I don't think the last attack was enough of a problem for us to change our policies, or spend time writing code to support advanced spam handling (there are many more exciting things we can be doing for LtU). If someone wants to help with the process of identifying suspicious accounts, deleting spam, and blocking ips, please volunteer. Help with running LtU is always appreciated.

Lisp Conference, March 22-25

Registration for the International Lisp Conference 2009 is now open. For a quick summary, see the announcement posted by conference chair Dan Weinreb.

The keynote speakers may be of interest to LtU readers: Gerald Jay Sussman, Shriram Krishnamurthi, and David Moon.

The conference is at MIT from March 22 to 25. Early registration is a very reasonable $210 ($75 for students).